Notes:

  1. The datasets associated with this report are not included by default.
  2. The datasets and full code can be provided upon request.

Step 01: Quick setup (loading all necessary libraries and setting the alpha value)

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## Loading required package: lpSolve
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'

Step 02: Descriptive statistics for participant flow and missingness

## 
## PreS001 PreS002 PreS003 PreS004 PreS005 PreS006 PreS007 PreS008 PreS009 PreS010 
##       5       4       4       4       5       3       2       2       2       2 
##    <NA> 
##       1
## 
## PostS011 PostS012 PostS013 PostS014 PostS015 PostS016 PostS017 PostS018 
##        5        3        4        4        4        5        4        1 
##     <NA> 
##        4
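
Frequency tables with an `<NA>` entry, like the two above, are what base R's `table()` prints when missing values are kept. A minimal sketch, assuming a hypothetical vector `pre_code` with made-up values (the report does not show the underlying column names):

```r
# Hypothetical codes; NA marks a missing record (illustrative values only)
pre_code <- c("PreS001", "PreS001", "PreS002", NA, "PreS003", "PreS002")

# useNA = "ifany" adds an <NA> entry only when missing values exist
counts <- table(pre_code, useNA = "ifany")
print(counts)

# Number of missing entries, computed directly
n_missing <- sum(is.na(pre_code))
n_missing
```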

Step 03: Descriptive statistics for pre & post speaking tests + change

The following are the pre-post speaking change statistics, without p-values or effect sizes (reported separately).

The following is the distribution of pre-post speaking test change by SPM English grade segment.
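
A per-segment summary of gains can be tabulated with base R's `aggregate()`. This is only a sketch: the data frame `df_demo`, its group sizes, and its values are synthetic, and the column names merely mirror those that appear in the models later in this report.

```r
# Synthetic illustration data (NOT the study data)
set.seed(1)
df_demo <- data.frame(
  spm_eng_band = factor(rep(c("A-range", "B-range", "C/D-range"),
                            times = c(6, 12, 12))),
  gain_s = round(rnorm(30, mean = 3, sd = 3) * 2) / 2  # scores in 0.5 steps
)

# n, mean, SD and median of the gain within each grade segment
by_band <- aggregate(
  gain_s ~ spm_eng_band, data = df_demo,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
)

# aggregate() returns a matrix column; flatten it into ordinary columns
by_band <- do.call(data.frame, by_band)
print(by_band)
```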

Step 04: Descriptive statistics for pre & post writing tests + change

The following are the pre-post writing change statistics, without p-values or effect sizes (reported separately).

The following is the distribution of pre-post writing test change by SPM English grade segment.

Step 05: Hypothesis-testing for H1a (parametric check, pre-post gains, significance and effect size, robustness check)

H1a: Shapiro-Wilk test for speaking gains:
## 
##  Shapiro-Wilk normality test
## 
## data:  df_s$pre_post_s_gain
## W = 0.9434, p-value = 0.1123

t-test report:

## 
##  Paired t-test
## 
## data:  df_s_p$post_s_score and df_s_p$pre_s_score
## t = 5.5017, df = 29, p-value = 6.297e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.863823 4.069511
## sample estimates:
## mean difference 
##        2.966667

Wilcoxon signed-rank test report:

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  df_s_p$post_s_score and df_s_p$pre_s_score
## V = 399, p-value = 8.953e-05
## alternative hypothesis: true location shift is not equal to 0

Effect sizes and 95% CI:

Summary of diff_s from df_s_p:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -3.500   0.625   3.500   2.967   5.000   7.500
##  [1] -3.5 -3.0 -2.0 -1.5  0.0  0.5  0.5  0.5  1.0  2.0  2.5  3.0  3.0  3.0  3.5
## [16]  3.5  4.0  4.5  4.5  4.5  5.0  5.0  5.0  5.5  5.5  5.5  6.0  6.5  7.0  7.5
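
The paired t statistic, its confidence interval, and a standardized effect size can be recomputed from the sorted differences printed above. The sketch below assumes Cohen's d_z (mean difference divided by the SD of the differences) as the effect-size definition; the report does not state which measure it uses.

```r
# Sorted speaking gains (post - pre), copied from the output above
diff_s <- c(-3.5, -3.0, -2.0, -1.5, 0.0, 0.5, 0.5, 0.5, 1.0, 2.0,
             2.5,  3.0,  3.0,  3.0, 3.5, 3.5, 4.0, 4.5, 4.5, 4.5,
             5.0,  5.0,  5.0,  5.5, 5.5, 5.5, 6.0, 6.5, 7.0, 7.5)

n      <- length(diff_s)
m_diff <- mean(diff_s)
s_diff <- sd(diff_s)

# A paired t-test is a one-sample t-test on the differences
t_stat <- m_diff / (s_diff / sqrt(n))

# 95% CI for the mean difference
ci <- m_diff + c(-1, 1) * qt(0.975, df = n - 1) * s_diff / sqrt(n)

# Cohen's d_z (assumed definition): mean difference / SD of differences
d_z <- m_diff / s_diff

round(c(mean = m_diff, t = t_stat, ci_low = ci[1], ci_high = ci[2], d_z = d_z), 4)
```

The mean, t statistic, and interval agree with the paired t-test output, and d_z comes out at about 1.00, which is the same as t divided by the square root of n.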

A quick summary for everything:

Visualizing effect size:

Step 06: Hypothesis-testing for H1b (parametric check, pre-post gains, significance and effect size, robustness check)

H1b: Shapiro-Wilk test for writing gains:
## 
##  Shapiro-Wilk normality test
## 
## data:  df_w$pre_post_w_gain
## W = 0.97297, p-value = 0.6232

t-test report:

## 
##  Paired t-test
## 
## data:  df_w_p$post_w_score and df_w_p$pre_w_score
## t = 1.6437, df = 29, p-value = 0.111
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.2442578  2.2442578
## sample estimates:
## mean difference 
##               1

Wilcoxon signed-rank test report:

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  df_w_p$post_w_score and df_w_p$pre_w_score
## V = 210.5, p-value = 0.08497
## alternative hypothesis: true location shift is not equal to 0

Effect sizes and 95% CI:

Summary of diff_w from df_w_p:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -8.00   -0.75    1.00    1.00    3.00    9.00
##  [1] -8 -4 -3 -3 -2 -1 -1 -1  0  0  0  0  0  0  1  1  1  2  2  2  2  3  3  3  3
## [26]  5  5  5  6  9
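
The Wilcoxon V statistic reported for writing can be reproduced by hand from the sorted differences above, which makes the handling of zeros and ties explicit. A minimal sketch in base R:

```r
# Sorted writing gains (post - pre), copied from the output above
diff_w <- c(-8, -4, -3, -3, -2, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1,
             1,  1,  2,  2,  2,  2,  3,  3, 3, 3, 5, 5, 5, 6, 9)

# 1. Drop zero differences, which carry no sign information
nz <- diff_w[diff_w != 0]

# 2. Rank the absolute differences; ties receive the average (mid) rank
r <- rank(abs(nz), ties.method = "average")

# 3. V is the sum of the ranks that belong to positive differences
V <- sum(r[nz > 0])
V  # 210.5

# Cross-check against stats::wilcox.test (warnings about ties/zeros are expected)
V_ref <- unname(suppressWarnings(wilcox.test(diff_w, mu = 0)$statistic))
```

Six of the 30 differences are zero, so only 24 observations contribute to V. This is also why the test falls back to a normal approximation with continuity correction rather than an exact p-value.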

A quick summary for everything:

Visualizing effect size:

Step 07: Hypothesis-testing for H2a (parametric check, pre-post speaking gains by SPM English grade segments, significance, and effect size)

H2a: Shapiro-Wilk test for normality of residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(m_h2a)
## W = 0.98159, p-value = 0.866

H2a: Levene's Test for homogeneity of variances:
H2a: Checking Homogeneity of Regression Slopes:
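
A common way to check homogeneity of regression slopes is to compare the ANCOVA model against one that adds a group-by-covariate interaction. The sketch below illustrates the idea on synthetic data; `df_demo` and its values are made up, and the report's own check may have been implemented differently.

```r
# Synthetic illustration data (NOT the study data); names mirror the report's model
set.seed(42)
df_demo <- data.frame(
  spm_eng_band = factor(rep(c("A-range", "B-range", "C/D-range"),
                            times = c(6, 12, 12))),
  pre_s_score  = round(runif(30, 2, 18))
)
df_demo$gain_s <- 6 - 0.3 * df_demo$pre_s_score + rnorm(30, sd = 3)

# ANCOVA model with one common slope for the covariate
m_common <- lm(gain_s ~ spm_eng_band + pre_s_score, data = df_demo)

# Same model plus segment-specific slopes (band x covariate interaction)
m_separate <- lm(gain_s ~ spm_eng_band * pre_s_score, data = df_demo)

# F test of the interaction: a non-significant result is consistent with equal slopes
slopes_test <- anova(m_common, m_separate)
print(slopes_test)
p_interaction <- slopes_test[["Pr(>F)"]][2]
```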

Fitting a linear regression model to explain speaking score change using SPM English grade and pre-test score:

## 
## Call:
## lm(formula = gain_s ~ spm_eng_band + pre_s_score, data = df_s_p)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7027 -2.1393  0.1355  1.7128  4.9240 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             7.9204     2.9857   2.653   0.0134 *
## spm_eng_bandB-range    -2.1319     1.6952  -1.258   0.2197  
## spm_eng_bandC/D-range  -3.0911     1.8827  -1.642   0.1127  
## pre_s_score            -0.2957     0.1954  -1.513   0.1422  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.956 on 26 degrees of freedom
## Multiple R-squared:  0.1021, Adjusted R-squared:  -0.001514 
## F-statistic: 0.9854 on 3 and 26 DF,  p-value: 0.415

Doing Type II ANOVA to check if SPM grade segment has overall effect on gains, while accounting for pre-test scores:

Checking effect size for each variate:
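
For a model with main effects only, the Type II test of each term is the F test obtained by dropping that term from the full model, so base R's `drop1()` is enough to illustrate it. Partial eta squared then follows from the same sums of squares. The sketch runs on synthetic data, and partial eta squared is an assumed choice of effect size; the report does not show which function or measure produced its results.

```r
# Synthetic illustration data (NOT the study data)
set.seed(42)
df_demo <- data.frame(
  spm_eng_band = factor(rep(c("A-range", "B-range", "C/D-range"),
                            times = c(6, 12, 12))),
  pre_s_score  = round(runif(30, 2, 18))
)
df_demo$gain_s <- 6 - 0.3 * df_demo$pre_s_score + rnorm(30, sd = 3)

m <- lm(gain_s ~ spm_eng_band + pre_s_score, data = df_demo)

# With no interaction terms, dropping each term in turn gives its Type II F test
type2 <- drop1(m, test = "F")
print(type2)

# Partial eta squared = SS_effect / (SS_effect + SS_residual)
ss_resid  <- deviance(m)                 # residual sum of squares of the full model
ss_effect <- type2[["Sum of Sq"]][-1]    # first row is "<none>"
names(ss_effect) <- rownames(type2)[-1]
partial_eta2 <- ss_effect / (ss_effect + ss_resid)
round(partial_eta2, 3)
```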

Checking which group(s) differ significantly, if there is/are any:

##  contrast                estimate   SE df t.ratio p.value
##  (A-range) - (B-range)      2.132 1.70 26   1.258  0.4312
##  (A-range) - (C/D-range)    3.091 1.88 26   1.642  0.2466
##  (B-range) - (C/D-range)    0.959 1.41 26   0.680  0.7769
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

Step 08: Hypothesis-testing for H2b (parametric check, pre-post writing gains by SPM English grade segments, significance, and effect size)

H2b: Shapiro-Wilk test for normality of residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(m_h2b)
## W = 0.98223, p-value = 0.8813

H2b: Levene's Test for homogeneity of variances:
H2b: Checking Homogeneity of Regression Slopes:

Fitting a linear regression model to explain writing score change using SPM English grade and pre-test score:

## 
## Call:
## lm(formula = gain_w ~ spm_eng_band + pre_w_score, data = df_w_p)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6085 -1.9136 -0.0713  1.6185  5.4485 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)             7.8955     2.3741   3.326  0.00263 **
## spm_eng_bandB-range    -5.2805     1.7340  -3.045  0.00527 **
## spm_eng_bandC/D-range  -5.3615     1.9268  -2.783  0.00991 **
## pre_w_score            -0.5563     0.2301  -2.417  0.02294 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.974 on 26 degrees of freedom
## Multiple R-squared:  0.2856, Adjusted R-squared:  0.2032 
## F-statistic: 3.465 on 3 and 26 DF,  p-value: 0.03057

Doing Type II ANOVA to check if SPM grade segment has overall effect on gains, while accounting for pre-test scores:

Checking effect size for each variate:

Checking which group(s) differ significantly, if there is/are any:

##  contrast                estimate   SE df t.ratio p.value
##  (A-range) - (B-range)      5.281 1.73 26   3.045  0.0141
##  (A-range) - (C/D-range)    5.361 1.93 26   2.783  0.0259
##  (B-range) - (C/D-range)    0.081 1.41 26   0.057  0.9982
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
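
The Tukey-adjusted p-values for the writing contrasts can be recovered from the reported t ratios with the studentized range distribution in base R. This is a sketch of the adjustment as I understand it (q = sqrt(2) * |t|, with a family of 3 means and 26 residual degrees of freedom); the short contrast labels are my own abbreviations.

```r
# t ratios and residual df copied from the writing contrasts above
t_ratio  <- c("A - B" = 3.045, "A - C/D" = 2.783, "B - C/D" = 0.057)
df_resid <- 26
k        <- 3  # number of group means in the family

# Studentized range statistic q = sqrt(2) * |t|; the upper tail is the adjusted p
p_tukey <- ptukey(sqrt(2) * abs(t_ratio), nmeans = k, df = df_resid,
                  lower.tail = FALSE)
round(p_tukey, 4)
```

The unadjusted p-values for the same t ratios appear in the coefficient table (0.00527 and 0.00991); the adjustment roughly triples them, but both A-range contrasts stay below 0.05.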

Step 09: Hypothesis-testing for H3a (parametric check, association between padlet post quality and pre-post speaking change)

H3a: Shapiro-Wilk test for normality of residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(m_h3a)
## W = 0.98108, p-value = 0.8536
H3a: Non-Constant Variance (Breusch-Pagan) test:
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 2.589108, Df = 1, p = 0.1076
H3a: Variance Inflation Factor (VIF) check:
##                  GVIF Df GVIF^(1/(2*Df))
## padlet_mean  1.223071  1        1.105925
## pre_s_score  2.080424  1        1.442367
## spm_eng_band 2.248739  2        1.224573
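
The non-constant variance score test with variance formula `~ fitted.values` can be sketched by hand: scale the squared residuals by their mean, regress them on the fitted values, and halve the regression sum of squares. The helper `ncv_score_test` and the data below are hypothetical and serve only to illustrate the calculation; they are not a substitute for `car::ncvTest()`.

```r
# Synthetic illustration data (NOT the study data)
set.seed(7)
df_demo   <- data.frame(x = runif(40, 0, 10))
df_demo$y <- 2 + 0.5 * df_demo$x + rnorm(40, sd = 1)
m <- lm(y ~ x, data = df_demo)

# Hypothetical helper: score test for variance depending on the fitted values
ncv_score_test <- function(model) {
  e2  <- resid(model)^2
  u   <- e2 / mean(e2)              # squared residuals scaled by the ML variance estimate
  aux <- lm(u ~ fitted(model))      # auxiliary regression on the fitted values
  reg_ss <- sum((fitted(aux) - mean(u))^2)
  chisq  <- reg_ss / 2              # score statistic, chi-square with 1 df
  c(chisq = chisq, df = 1, p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

res <- ncv_score_test(m)
round(res, 4)
```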

Fitting a linear regression model to explain speaking score change using padlet score, pre-test score, and SPM English grade:

## 
## Call:
## lm(formula = gain_s ~ padlet_mean + pre_s_score + spm_eng_band, 
##     data = df_s_p)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7456 -2.2003  0.2613  1.7483  4.7704 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             9.0309     3.9953   2.260   0.0328 *
## padlet_mean            -0.2390     0.5596  -0.427   0.6729  
## pre_s_score            -0.2887     0.1992  -1.449   0.1598  
## spm_eng_bandB-range    -2.3469     1.7946  -1.308   0.2028  
## spm_eng_bandC/D-range  -3.2892     1.9684  -1.671   0.1072  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.003 on 25 degrees of freedom
## Multiple R-squared:  0.1086, Adjusted R-squared:  -0.03403 
## F-statistic: 0.7614 on 4 and 25 DF,  p-value: 0.5602

Outputting 95% CI for padlet score:

##      2.5 %     97.5 % 
## -1.3915156  0.9134816
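
The interval above is the usual t-based confidence interval and can be recomputed from the `padlet_mean` row of the coefficient table. A minimal sketch using the rounded estimate and standard error, so agreement is to about three decimals:

```r
# Values copied from the padlet_mean row of the speaking model above
est      <- -0.2390
se       <- 0.5596
df_resid <- 25

# 95% CI: estimate +/- t quantile * standard error
ci <- est + c(-1, 1) * qt(0.975, df = df_resid) * se
round(ci, 3)
```

Because the interval spans zero, it is consistent with the non-significant p-value (0.6729) reported for `padlet_mean` in the speaking model.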

Fitting a linear regression model as a sensitivity check, excluding participants who recorded fewer than 3 timepoints:

## 
## Call:
## lm(formula = gain_s ~ padlet_mean + pre_s_score + spm_eng_band, 
##     data = df_s_p2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6668 -1.7960  0.1483  1.8679  4.9434 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             9.5480     4.1113   2.322   0.0294 *
## padlet_mean            -0.2855     0.5740  -0.497   0.6236  
## pre_s_score            -0.3083     0.2065  -1.493   0.1490  
## spm_eng_bandB-range    -2.1229     1.8689  -1.136   0.2677  
## spm_eng_bandC/D-range  -3.6003     2.0631  -1.745   0.0943 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.062 on 23 degrees of freedom
## Multiple R-squared:  0.125,  Adjusted R-squared:  -0.02717 
## F-statistic: 0.8215 on 4 and 23 DF,  p-value: 0.5248

Step 10: Hypothesis-testing for H3b (parametric check, association between padlet post quality and pre-post writing change)

H3b: Shapiro-Wilk test for normality of residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(m_h3b)
## W = 0.98023, p-value = 0.8316
H3b: Non-Constant Variance (Breusch-Pagan) test:
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.1099283, Df = 1, p = 0.74023
H3b: Variance Inflation Factor (VIF) check:
##                  GVIF Df GVIF^(1/(2*Df))
## padlet_mean  1.261764  1        1.123283
## pre_w_score  2.182084  1        1.477188
## spm_eng_band 2.280277  2        1.228844

Fitting a linear regression model to explain writing score change using padlet score, pre-test score, and SPM English grade:

## 
## Call:
## lm(formula = gain_w ~ padlet_mean + pre_w_score + spm_eng_band, 
##     data = df_w_p)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8675 -1.6545 -0.1785  1.2300  4.7741 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)             2.8624     3.2018   0.894  0.37985   
## padlet_mean             1.1127     0.5102   2.181  0.03882 * 
## pre_w_score            -0.6249     0.2174  -2.875  0.00815 **
## spm_eng_bandB-range    -4.3781     1.6729  -2.617  0.01484 * 
## spm_eng_bandC/D-range  -4.5238     1.8416  -2.456  0.02133 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.78 on 25 degrees of freedom
## Multiple R-squared:  0.3998, Adjusted R-squared:  0.3038 
## F-statistic: 4.163 on 4 and 25 DF,  p-value: 0.01015

Outputting 95% CI for padlet score:

##      2.5 %     97.5 % 
## 0.06187987 2.16349535

Fitting a linear regression model as a sensitivity check, excluding participants who recorded fewer than 3 timepoints:

## 
## Call:
## lm(formula = gain_w ~ padlet_mean + pre_w_score + spm_eng_band, 
##     data = df_w_p2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4084 -1.4596 -0.2434  1.4392  5.4064 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)             2.3611     3.2152   0.734  0.47016   
## padlet_mean             1.1867     0.5111   2.322  0.02946 * 
## pre_w_score            -0.6114     0.2191  -2.791  0.01039 * 
## spm_eng_bandB-range    -4.7898     1.6986  -2.820  0.00971 **
## spm_eng_bandC/D-range  -4.3048     1.8798  -2.290  0.03152 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.77 on 23 degrees of freedom
## Multiple R-squared:  0.4378, Adjusted R-squared:   0.34 
## F-statistic: 4.477 on 4 and 23 DF,  p-value: 0.008028

Step 11: Selecting speaking test participants randomly for ICC analysis

We select randomly from each segment while adhering to stratification (4:4:7):
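
A stratified draw of this kind can be sketched in base R by sampling without replacement inside each segment. The roster, the seed, and the mapping of the 4:4:7 quota onto the three segments are all assumptions made for illustration; the report does not state them.

```r
set.seed(2024)  # hypothetical seed, so the draw is reproducible

# Synthetic roster (NOT the study data): 30 participants across three segments
roster <- data.frame(
  id = sprintf("P%02d", 1:30),
  spm_eng_band = rep(c("A-range", "B-range", "C/D-range"), times = c(6, 10, 14))
)

# Number to draw per segment; which segment gets 4, 4 or 7 is an assumption
n_per_band <- c("A-range" = 4, "B-range" = 4, "C/D-range" = 7)

# Sample without replacement inside each segment, then stack the draws
picked <- do.call(rbind, lapply(names(n_per_band), function(b) {
  pool <- roster[roster$spm_eng_band == b, ]
  pool[sample(nrow(pool), n_per_band[[b]]), ]
}))

table(picked$spm_eng_band)
```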

Step 12: Performing ICC analysis for speaking tests

ICC analysis output for pre-speaking test:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 15 
##      Raters = 2 
##    ICC(A,1) = 0.895
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##    F(14,14) = 16.9 , p = 2.17e-06 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.714 < ICC < 0.963
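
The single-score agreement coefficient ICC(A,1) shown above is built from the mean squares of a two-way decomposition of the ratings (subjects by raters). The sketch below assumes the standard two-way agreement formula; `icc_a1` is a hypothetical helper name and the ratings are made up.

```r
# Hypothetical helper: ICC(A,1), two-way model, absolute agreement, single rater
icc_a1 <- function(ratings) {
  ratings <- as.matrix(ratings)
  n <- nrow(ratings)   # subjects
  k <- ncol(ratings)   # raters
  grand <- mean(ratings)

  ss_rows  <- k * sum((rowMeans(ratings) - grand)^2)  # between subjects
  ss_cols  <- n * sum((colMeans(ratings) - grand)^2)  # between raters
  ss_total <- sum((ratings - grand)^2)
  ss_err   <- ss_total - ss_rows - ss_cols

  msr <- ss_rows / (n - 1)
  msc <- ss_cols / (k - 1)
  mse <- ss_err / ((n - 1) * (k - 1))

  (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))
}

# Made-up ratings: rater 2 always scores exactly 2 points higher than rater 1
r1 <- 1:5
offset_ratings <- cbind(rater1 = r1, rater2 = r1 + 2)
icc_offset <- icc_a1(offset_ratings)

# Identical ratings give perfect agreement
icc_same <- icc_a1(cbind(rater1 = r1, rater2 = r1))
c(offset = icc_offset, identical = icc_same)
```

The constant offset lowers the agreement ICC to 5/9 even though the two raters rank the subjects identically. That is the reason for choosing the agreement type when systematic differences between raters matter.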

ICC analysis output for post-speaking test:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 15 
##      Raters = 2 
##    ICC(A,1) = 0.812
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##  F(14,4.17) = 17.1 , p = 0.00604 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.226 < ICC < 0.946

ICC analysis output for combined pre-post speaking tests:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 30 
##      Raters = 2 
##    ICC(A,1) = 0.837
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##  F(29,18.8) = 13.2 , p = 1.64e-07 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.652 < ICC < 0.923

Examining pre and post speaking test score ranges:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.50    6.75   11.00   10.83   13.25   18.50
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    7.75   11.00   10.87   13.75   17.50
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   10.50   13.50   13.40   17.25   20.50
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    6.50   12.50   10.90   14.75   21.50

H1a sensitivity check: paired t-test on the used scores (post_s_used vs. pre_s_used)

## 
##  Paired t-test
## 
## data:  df_s_rbchk_tt$post_s_used and df_s_rbchk_tt$pre_s_used
## t = 5.2019, df = 29, p-value = 1.447e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  1.415945 3.250722
## sample estimates:
## mean difference 
##        2.333333

H1a sensitivity check: baseline-adjusted model on the used scores (gain_s_used)

## 
## Call:
## lm(formula = gain_s_used ~ spm_eng_band + pre_s_used, data = df_s_rbchk_tt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1645 -1.5887  0.0267  1.5183  4.6469 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)             6.8711     2.4835   2.767   0.0103 *
## spm_eng_bandB-range    -1.9676     1.3842  -1.421   0.1671  
## spm_eng_bandC/D-range  -3.2704     1.5609  -2.095   0.0460 *
## pre_s_used             -0.2497     0.1624  -1.538   0.1361  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.4 on 26 degrees of freedom
## Multiple R-squared:  0.1446, Adjusted R-squared:  0.04585 
## F-statistic: 1.464 on 3 and 26 DF,  p-value: 0.2472

Step 13: Selecting writing test participants randomly for ICC analysis

We select randomly from each segment while adhering to stratification (4:4:7):

Step 14: Performing ICC analysis for writing tests

ICC analysis output for pre-writing test:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 15 
##      Raters = 2 
##    ICC(A,1) = 0.377
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##    F(14,15) = 2.28 , p = 0.0629 
## 
##  95%-Confidence Interval for ICC Population Values:
##   -0.112 < ICC < 0.73

ICC analysis output for post-writing test:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 15 
##      Raters = 2 
##    ICC(A,1) = 0.826
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##  F(14,4.25) = 18.4 , p = 0.0048 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.264 < ICC < 0.95

ICC analysis output for combined pre-post writing tests:

##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : agreement 
## 
##    Subjects = 30 
##      Raters = 2 
##    ICC(A,1) = 0.724
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##    F(29,14) = 7.83 , p = 0.000108 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.415 < ICC < 0.87

Examining pre and post writing test score ranges:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     3.0     5.0     4.4     5.0    12.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   3.467   4.000  10.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   5.000   6.667   8.000  15.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   4.000   4.933   6.500  13.000

H1b sensitivity check: paired t-test on the used scores (post_w_used vs. pre_w_used)

## 
##  Paired t-test
## 
## data:  df_w_rbchk_tt$post_w_used and df_w_rbchk_tt$pre_w_used
## t = 1.4157, df = 29, p-value = 0.1675
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.3557092  1.9557092
## sample estimates:
## mean difference 
##             0.8

H1b sensitivity check: baseline-adjusted model on the used scores (gain_w_used)

## 
## Call:
## lm(formula = gain_w_used ~ spm_eng_band + pre_w_used, data = df_w_rbchk_tt)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.634 -1.525 -0.150  1.121  5.162 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)             6.3365     2.2889   2.768  0.01025 * 
## spm_eng_bandB-range    -4.8037     1.6732  -2.871  0.00803 **
## spm_eng_bandC/D-range  -4.3535     1.8733  -2.324  0.02821 * 
## pre_w_used             -0.4332     0.2286  -1.895  0.06920 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.839 on 26 degrees of freedom
## Multiple R-squared:  0.2456, Adjusted R-squared:  0.1585 
## F-statistic: 2.821 on 3 and 26 DF,  p-value: 0.05856